Poker Opponent Modeling ∗
Michel Salim and Paul Rohwer

Authors

  • Michel Salim
  • Paul Rohwer
  • David Leake
  • Steven Bogaerts
Abstract

Utilizing resources and research from the University of Alberta Poker Research Group, we are investigating improvements to opponent modeling. Currently, our simple poker bot plays online against instances of PokiBots, the poker program created by the University of Alberta research group. After some decision-rule building, our poker bot is competitive. Our next step is to build upon this research and investigate opponent modeling by experimenting with frequency distributions and multi-agent case-based reasoning. We feel this can provide both a good long-term and a good short-term measure of opponent play. The multi-agent case base will track the current temperature of a player: whether an opponent plays wildly after a demoralizing loss or too confidently after a thrilling victory. This knowledge will be used to either contract or expand the hand strength threshold used by our poker bot.

Opponent Modeling Motivation

A century ago, Mark Twain bemoaned the decline in the art of lying (Twain 1882). To lie, he felt, is a great virtue and a lubricant to social discourse. Twain objected to the sad state into which the eternal art had fallen. Today, artificial intelligence and computer science are neglecting this great virtue. Few pursue the art of lying within computer science. The domain is difficult, since computers are most often engineered to help people, not to lie to them. Yet people in general pay good money for others to lie for them. Lawyers and politicians, for example, enjoy a healthy reputation and good benefits from practicing the art of lying. Artificial intelligence has yet to explore and challenge these expert human liars.

This paper explores machine learning through opponent modeling in a game where lying is critical to success. In poker, to bluff is to lie about one's hand strength. Given a weak hand, bluffing signals the opposite to our opponents.
The condition for success, the critical determinant for deciding to bluff, is correctly understanding the opponents. The expectation is that an opponent will fold after one bluffs. For a poker player, making a bluff decision therefore requires a correct estimate of the opponent's fold probability: the likelihood that an opponent will fold when faced with our challenge, which in poker is either a bet or a raise.

∗ With much appreciation for David Leake, Steven Bogaerts, and the University of Alberta Poker Research Group. Copyright © 2005, Michel Salim and Paul Rohwer. All rights reserved.

Opponent modeling targets accurate predictions of future opponent actions. For poker, opponent modeling is difficult. Poker is a game of imperfect information, chance and incomplete knowledge. Contrast this with other games targeted by machine learning research. Chess has a game state known to each player. There is no risk or chance, since playing the best move is always the best action. Other games where chance is present, like backgammon, still retain perfect information. And games that do combine chance and imperfect information typically include just one opponent. These games, such as degenerate one-against-one poker games and RoShamBo (the children's game of rock, paper, scissors), do not have the additional complexity of play against multiple opponents.

These difficulties have led researchers to conclude that "opponent modeling in poker appears to have many of the characteristics of the most difficult problems in machine learning–noise, uncertainty, an unbounded number of dimensions to explore, and a need to quickly learn and generalize from relatively small number of heterogeneous training examples." The word heterogeneous is used because when a player folds, quitting the game early, their cards, the missing link of poker's imperfect information, are not revealed to other players.

What is the gain from studying opponent modeling? Human poker players are good at understanding their opponents.
The best human players are frequently able to form an accurate model from single data points. And while the best poker programs have successfully improved with opponent modeling, the programs' developers conclude that there are numerous opportunities for improvement and that opponent modeling is critical if a poker program is to defeat the best human players.

In computer poker game-playing research, the University of Alberta Poker Research Group leads a sparse field of researchers. They have developed an excellent poker-playing machine called Poki, building on their prior work developing a world champion checkers program, Chinook. Poki and the University of Alberta research group are focused on adaptive artificial intelligence. Key to Poki's success thus far is adjustment to new information. Yet the deluge of information leads Poki to adjust more slowly to opponents. For example, in heads-up trials with an online poker legend, their PokiBot successfully outplayed the human over 3500 hands. But then the human changed course, refocusing after modeling the PokiBot. He changed his playing mode from overly aggressive to cagey passiveness, outplaying PokiBot over the next 3500 hands.

In the group's paper, 'The Challenge of Poker', they conclude that the build-up of inertia after thousands of observed data points can be detrimental if the player changes mood. Past success may have been due to the static or fixed playing style of opponents. They also conclude that it is difficult to track good players who consistently alter their playing style over relatively brief periods. This adjustment inertia helps explain why the human expert player proved superior to Poki.

Our prescription for Poki's adaptation inertia is to vary playing style, pursuing the emotion of the table by tracking the ebb and flow of the game. We track opponents with both a long-term and a short-term picture: keeping long-term measures of frequency and building short-term models for adaptation.
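The long-term and short-term picture described above can be sketched as a pair of frequency counts per opponent. This is a minimal illustration and not the authors' implementation: the class name `OpponentTracker`, the window size of 20 actions, and the `aggression_shift` measure are all hypothetical, and the three action labels borrow the simplified fold, check/call, bet/raise triple that the paper introduces in its background section.

```python
from collections import Counter, deque

# Simplified action labels (assumption: the fold, check/call, bet/raise triple).
ACTIONS = ("fold", "check/call", "bet/raise")


class OpponentTracker:
    """Tracks one opponent's action frequencies over two horizons (hypothetical)."""

    def __init__(self, window=20):
        self.long_term = Counter()          # every action ever observed
        self.recent = deque(maxlen=window)  # only the last `window` actions

    def observe(self, action):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.long_term[action] += 1
        self.recent.append(action)

    def long_term_rate(self, action):
        total = sum(self.long_term.values())
        return self.long_term[action] / total if total else 0.0

    def short_term_rate(self, action):
        if not self.recent:
            return 0.0
        return self.recent.count(action) / len(self.recent)

    def aggression_shift(self):
        """Positive when the opponent bets or raises more often than usual."""
        return self.short_term_rate("bet/raise") - self.long_term_rate("bet/raise")


tracker = OpponentTracker(window=20)
for _ in range(80):
    tracker.observe("fold")        # a long passive history
for _ in range(20):
    tracker.observe("bet/raise")   # followed by a sudden aggressive streak
print(tracker.aggression_shift())
```

In this example the long-term bet/raise rate is 20 out of 100 actions while the short-term rate is 20 out of 20, so the shift is strongly positive. A large gap between the two horizons is the kind of signal that a long history alone would smooth away.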
For us, a black-box neural net does not provide a simple enough understanding. By using a case base and a case-based reasoning framework, we will understand what influences our poker bot's decisions. The combination of long-term opponent characteristics and short-term opponent changes will target a table temperature. Is a player who usually has an air of passivity, folding early and often, suddenly playing like a maniac and betting aggressively over the short term? If we recognize this departure from the player's history and alter our expectations quickly, then our poker bot should loosen its hand strength threshold and let the maniacal opponent lose more!

Texas Hold'em Poker Background

Our opponent modeling experiments use a popular form of poker called Texas Hold'em. This is a multi-player game in which players share a set of community cards. Each player combines those community cards with two cards dealt face down at the beginning, attempting to develop the best hand as play proceeds through four stages of betting, bluffing, deceit and folding. To bet is to put money into the pot, which goes to the winner: the last remaining player or the player with the best hand, that is, the best combination of the community cards and the two private cards. If more than one player remains after the last stage, then all cards are revealed. Suddenly, the game exhibits perfect information and each player discovers the other players' two private cards, termed hole cards. This discovery stage is the crucial point for machines to learn about opponents. Until this last stage, commonly termed the showdown, players attempt to infer their opponents' hand strength by forming an accurate opponent model and applying it to the current game context. The game context changes as community cards are revealed at each stage.

A general Texas Hold'em sequence uses four stages, termed pre-flop, flop, turn and river. After the hole cards, the two private cards, are dealt to each player, the pre-flop commences with a round of betting.
Each player chooses from a set of actions: (1) fold, quitting the hand and the game; (2) check, remaining in the game but declining the opportunity to bet; (3) call, putting enough money into the pot to remain in the game, which occurs in the pre-flop and after an earlier player bets; (4) bet, putting an amount into the pot and requiring all other players to match or raise it; and (5) raise, increasing the amount others must pay to remain after another player bets. The set of actions can be simplified into an action triple: fold, check/call, and bet/raise. Poki uses this simplification to calculate a probability action triple, p(fold), p(check/call), p(bet/raise), and chooses from it an action to take.

After the pre-flop stage, three community cards are revealed. This is called the flop, and another round of betting ensues. At each of the next two stages, the turn and the river, one community card is revealed. This brings the total number of community cards to five. If players remain after the betting round at the river, then a showdown occurs. The showdown is where hole cards are revealed and the best hand wins the pot. In the event of a tie, the pot is split evenly amongst the winners.

Dimensions of Expert Poker Play

The University of Alberta Poker Research Group has delineated the dimensions of expert play. These are the factors that must be utilized to defeat expert players. The first is an assessment of hand quality, which combines elements of the game context: the hole cards, the community cards, the number of active players, the potential hand improvement and the betting position of the player. The potential hand improvement is a hand strength factor, but more importantly it provides a calculation of how the hand may improve. Bluffing is the ability to win given a weak hand. A hand that would lose in a perfect information world can still win if all other players fold. This means bluffing depends on the probability that an opponent will fold, quitting the game, if challenged with a bet or a raise.
Unpredictability is a measure of play variance, but more importantly it is a measure of how well a player can be modelled. Finally, opponent modeling aims at accurate prediction of opponent actions. Opponent modeling is crucial for bluff and fold decisions.
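Two of the ideas above lend themselves to a short sketch: choosing an action from a probability triple, and the link between a bluff and the opponent's fold probability. The code below is an illustration under stated assumptions rather than Poki's or the authors' method. The function names are hypothetical, sampling is only one plausible reading of "choosing from" the triple, and the bluff calculation assumes a single opponent, a bluff that always loses when called, and no further betting.

```python
import random


def choose_action(triple, rng=random):
    """Sample an action from (p_fold, p_check_call, p_bet_raise).

    Assumption: the action is drawn at random in proportion to the triple.
    """
    p_fold, p_call, _p_raise = triple
    if min(triple) < 0 or abs(sum(triple) - 1.0) > 1e-9:
        raise ValueError("triple must be non-negative and sum to 1")
    r = rng.random()  # uniform in [0, 1)
    if r < p_fold:
        return "fold"
    if r < p_fold + p_call:
        return "check/call"
    return "bet/raise"


def bluff_expected_value(p_fold, pot, bet):
    """Expected gain of a pure bluff under the simplifying assumptions above.

    With probability p_fold the opponent folds and we win the pot;
    otherwise we are called and lose the amount we bet.
    """
    return p_fold * pot - (1.0 - p_fold) * bet


def break_even_fold_probability(pot, bet):
    """Fold probability at which the bluff's expected value is zero."""
    return bet / (pot + bet)


# Hypothetical numbers: a pot of 30 units and a bluff bet of 10 units.
print(break_even_fold_probability(pot=30, bet=10))
print(bluff_expected_value(p_fold=0.5, pot=30, bet=10))
print(choose_action((0.2, 0.5, 0.3)))
```

Under these assumptions the bluff only pays when the estimated fold probability exceeds bet / (pot + bet), which is why an accurate opponent model matters so much for the decision: an overestimate of the fold probability turns a profitable-looking bluff into a losing one.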

Similar resources

An Experimental Approach to Online Opponent Modeling in Texas Hold'em Poker

The game of Poker is an excellent test bed for studying opponent modeling methodologies applied to non-deterministic games with incomplete information. The most known Poker variant, Texas Hold'em Poker, combines simple rules with a huge amount of possible playing strategies. This paper is focused on developing algorithms for performing simple online opponent modeling in Texas Hold'em. The oppon...

University of Alberta expert poker agent: A survey

Games have always been a natural topic for Artificial Intelligence researchers to study and poker has proven to be a game that is both interesting and challenging. Part of the challenge of poker comes from the fact that it is a game of imperfect knowledge where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-maki...

Active Sensing for Opponent Modeling in Poker

One approach to designing an intelligent agent capable of winning competitive games such as Texas hold’em poker is to use opponent modeling to learn about an opponent’s behavior, then exploit that knowledge to maximize long term winnings. However, opponent modeling can suffer from several problems, including slow convergence due to a lack of a priori knowledge, noisy or dynamic opponent behavio...

Opponent Modeling in Poker

Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect knowledge, where multiple competing agents must deal with risk management, agent modeling, unreliable information and deception, much like decision-making applications in the real world. Agent modeling is one of the most difficult problems in decision-making applications and in poker it is essential...

Building a Computer Poker Agent with Emphasis on Opponent Modeling

In this thesis, we present a computer agent for the game of no-limit Texas Hold'em Poker for two players. Poker is a partially observable, stochastic, multi-agent, sequential game. This combination of characteristics makes it a very challenging game to master for both human and computer players. We explore this problem from an opponent modeling perspective, using data mining to build a database...

Publication date: 2005